LnD: investigation into restart submission report generation #7551

Merged: @colinbruce merged 4 commits into main from lnd/restart-submissions-test on Jan 17, 2025


@colinbruce (Contributor) commented Dec 20, 2024

What

When CCMS submissions were restarted on the morning of 20 Dec, the submissions resumed but their reports were not generated as expected.

This PR investigates an issue that occurs when CCMS submissions have been paused during an evening downtime. Previously we had only seen 2 or 3 paused submissions at most.

On this occasion there were 10 applications to restart. This turned out to be too many!

I recreated the situation (10 paused applications) and turned the submissions back on. The first 5 (each Sidekiq worker has 5 threads) requested a CCMS case reference and then tried to generate their reports at the same time, and the worker container ran out of memory and crashed. When a new pod was spawned, the next 5 started. This left all 10 applications trying to build reports and failing.
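
To make the concurrency arithmetic concrete, here is a minimal pure-Ruby model (not Sidekiq itself) of a fixed pool of threads draining a queue of paused applications. The method name peak_concurrent_reports and the sleep standing in for a memory-heavy report build are hypothetical; the sketch only shows that the number of reports built at once is capped by the thread count, so a 5-thread worker can attempt 5 builds in parallel.

```ruby
# Hypothetical model, NOT Sidekiq's implementation: a fixed pool of worker
# threads drains a queue of paused applications, and we record how many
# "report builds" were in progress at the same moment.
def peak_concurrent_reports(applications:, threads:)
  queue = Queue.new
  applications.times { |i| queue << i }
  queue.close # once closed and empty, pop returns nil and workers exit

  lock = Mutex.new
  running = 0
  peak = 0

  workers = Array.new(threads) do
    Thread.new do
      while (_application_id = queue.pop)
        lock.synchronize do
          running += 1
          peak = [peak, running].max
        end
        sleep 0.05 # stand-in for generating the submission reports
        lock.synchronize { running -= 1 }
      end
    end
  end

  workers.each(&:join)
  peak
end

[5, 2].each do |threads|
  peak = peak_concurrent_reports(applications: 10, threads: threads)
  puts "#{threads} threads -> peak of #{peak} concurrent report builds"
end
```

The peak can never exceed the thread count, which is why giving report creation its own two-thread pool bounds memory use no matter how many submissions are waiting to restart.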

Working as a team, we identified that Sidekiq Capsules should work for us. I restored a backup of the test branch with 10 paused applications and created a new report_creator queue with two threads. This allowed two applications to build reports simultaneously and successfully cleared the paused submissions.
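
A minimal sketch of what such a capsule might look like, following the capsule API described in the Sidekiq 7 documentation. The initializer path, the job class name ReportCreatorJob, and the use of Sidekiq::Job rather than ActiveJob are assumptions for illustration; the actual change in this PR may be wired differently.

```ruby
# config/initializers/sidekiq.rb (path assumed)
Sidekiq.configure_server do |config|
  # Dedicated capsule: jobs on the "report_creator" queue are processed by
  # at most two threads per Sidekiq process, separate from the default
  # capsule's own thread pool.
  config.capsule("report_creator") do |cap|
    cap.concurrency = 2
    cap.queues = %w[report_creator]
  end
end

# Hypothetical job class; the real worker in this codebase may differ.
class ReportCreatorJob
  include Sidekiq::Job
  sidekiq_options queue: "report_creator"

  def perform(application_id)
    # build the CCMS submission reports for this application (omitted)
  end
end
```

Capsule concurrency applies per Sidekiq process, so each worker pod would run at most two report builds at once while its default capsule keeps its own threads for other queues.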

Checklist

Before you ask people to review this PR:

  • Tests and rubocop should be passing: bundle exec rake
  • GitHub should not be reporting conflicts; you should have recently run git rebase main.
  • The standards in the Git Workflow document on Confluence should be followed.
  • There should be no unnecessary whitespace changes. These make diffs harder to read and conflicts more likely.
  • The PR description should say what you changed and why, with a link to the JIRA story.
  • You should have looked at the diff against main and ensured that nothing unexpected is included in your changes.
  • You should have checked that the commit messages say why the change was made.

@colinbruce force-pushed the lnd/restart-submissions-test branch 3 times, most recently from ba220af to 26e98e2, December 30, 2024 12:21
@colinbruce added the ready for review label Dec 30, 2024
@colinbruce marked this pull request as ready for review December 30, 2024 14:39
@colinbruce requested a review from a team as a code owner December 30, 2024 14:39
@colinbruce force-pushed the lnd/restart-submissions-test branch 4 times, most recently from fdda8fe to 26916a2, January 15, 2025 13:08
This allows us to check the flow using OpenSearch if an error
occurs during state transitions. Rather than taking up DB
space, we can check the logs and record output if needed.

This sets up a new capsule with a concurrency of 2, which
allows two applications to have their reports created at
the same time. The current handling allows 5, and we saw
crashes after a pause in CCMS submissions.
@colinbruce force-pushed the lnd/restart-submissions-test branch from 26916a2 to 56ebf23, January 16, 2025 11:31
Clarify the calling class and make the output comma-delimited for extraction to CSV if needed.
@colinbruce force-pushed the lnd/restart-submissions-test branch from 56ebf23 to 54042e2, January 16, 2025 12:02
@colinbruce added the approved label and removed the ready for review label Jan 17, 2025
@colinbruce merged commit 08336be into main Jan 17, 2025
15 checks passed
@colinbruce deleted the lnd/restart-submissions-test branch January 17, 2025 07:01